Querying a pandas DataFrame

Overview:

  • The DataFrame class of pandas library provides multiple ways to query data from a DataFrame instance.
  • To get a single value from a DataFrame using the indexes and labels, the properties at and iat are used. To get a group of values by specifying the indexes and labels the properties loc and iloc are used.
  • To pop a column from a DataFrame, the method pop() is used.
  • To get the first "n" rows and the last "n" rows of the DataFrame, the head() and tail() methods are used.
  • To query the data based on Boolean expressions(including specific customizations) methods like query() and asof() are used.
  • To iterate over the DataFrame, the method items(), iteritems() are used.

Example 1 - Querying a single value from a  pandas DataFrame using indices and labels:

# Example Python program to query a value
# using the index
import pandas as pds

# Create a DataFrame
lattice = [(-1, -1, -1),
            (-1, 0, -1),
            (-1, -1, -10)];
df = pds.DataFrame(lattice);
print("DataFrame:");
print(df);

# Add index and column labels
df.index     = ["r1", "r2", "r3"];
df.columns     = ["c1", "c2", "c3"];

# Query the value from a location using the indices
row = 1;
col = 1;
val = df.iat[row, col];
print("Value at the indexes %d,%d:"%(row, col));
print(val);

# Query the value from a location using labels
row = "r3";
col = "c3";
val = df.at[row, col];
print("Value at the labels %s,%s:"%(row, col));
print(val);

Output:

DataFrame:
   0  1   2
0 -1 -1  -1
1 -1  0  -1
2 -1 -1 -10
Value at the indexes 1,1:
0
Value at the labels r3,c3:
-10

Example 2 - Querying a group of values from a  pandas DataFrame using indices and labels:

# Example Python program that queries
# a group of values using the indexes and labels
import pandas as pds

tokens = [("ab", "bc", "cd"),
          ("de", "ef", "fg"),
          ("gh", "hi", "ij")];
df = pds.DataFrame(data = tokens);

df.index     = ["r1", "r2", "r3"];
df.columns     = ["x", "y", "z"];
print("DataFrame:");
print(df);

# Query different groups of values using indices
r1 = 0;
r2 = 0;
c1 = 1;
c2 = 1;
print("Query results using indices [%d, %d], [%d, %d]:"%(r1, r2, c1, c2));
print(df.iloc[[r1,r2], [c1,c2]]);

r1 = 0;
r2 = 1;
c1 = 1;
c2 = 2;
print("Query results using indices [%d, %d], [%d, %d]:"%(r1, r2, c1, c2));
print(df.iloc[[0,1], [1,2]]);

# Query different groups of values using labels
r1 = "r1";
r2 = "r2";
c1 = "x";
c2 = "z";
print("Query results using labels [\"%s\", \"%s\"], [\"%s\", \"%s\"]:"%(r1, r2, c1, c2));
print(df.loc[["r1","r2"], ["x","z"]]);

Output:

DataFrame:
     x   y   z
r1  ab  bc  cd
r2  de  ef  fg
r3  gh  hi  ij
Query results using indices [0, 0], [1, 1]:
     y   y
r1  bc  bc
r1  bc  bc
Query results using indices [0, 1], [1, 2]:
     y   z
r1  bc  cd
r2  ef  fg
Query results using labels ["r1", "r2"], ["x", "z"]:
     x   z
r1  ab  cd
r2  de  fg

Example 3 - Pop a column from a DataFrame:

# Example Python program that pops a value
# from a DataFrame
import pandas as pds

# Create a pandas DataFrame
vals = [("I","II","III"),
        ("IV","V","VI"),
        ("VII","VIII","IX")];

df = pds.DataFrame(data=vals);
df.columns = ["c1", "c2", "c3"];
df.index = ["r1", "r2", "r3"];
print("DataFrame:");
print(df);

# Remove a column
popped = df.pop("c2");
print("Removed column:");
print(popped);
print(type(popped))

print("DataFrame after popping a column:");
print(df);

Output:

DataFrame:
     x   y   z
r1  ab  bc  cd
r2  de  ef  fg
r3  gh  hi  ij
Query results using indices [0, 0], [1, 1]:
     y   y
r1  bc  bc
r1  bc  bc
Query results using indices [0, 1], [1, 2]:
     y   z
r1  bc  cd
r2  ef  fg
Query results using labels ["r1", "r2"], ["x", "z"]:
     x   z
r1  ab  cd
r2  de  fg

Example 4 - Querying the DataFrame using a Boolean expression:

  • The query() method accepts a string containing a Boolean expression. Typically a column name is used against a value or a variable containing a value. To refer to a variable inside the Boolean expression, the symbol @ is prepended to the variable name.

# Example Python program that queries
# a pandas DataFrame using a Boolean
# expression
import pandas as pds

fractions = [(1.1, 1.2, 1.3),
               (1.4, 1.5, 1.6),
               (1.7, 1.8, 1.9)];

# Create a DataFrame
df = pds.DataFrame(data=fractions,
                   columns=["c1", "c2", "c3"],
                   index = ["r1", "r2", "r3"]);    

print("DataFrame:");
print(df);

# Query the rows where a column value exceeds a threshold
threshold = 1.1;
queryResults = df.query("c1 > @threshold");
print("Query results:");
print(queryResults);

Output:

DataFrame:
     c1   c2   c3
r1  1.1  1.2  1.3
r2  1.4  1.5  1.6
r3  1.7  1.8  1.9
Query results:
     c1   c2   c3
r2  1.4  1.5  1.6
r3  1.7  1.8  1.9

Example 5 - Querying DateTime indexed DataFrame for Non-None rows:

# Example python program that queries
# non-none rows of a DataFrame using DateTimeIndex
import pandas as pds

# Prices of commodities
prices = {"Commodity 1": [110.1, None, 109.6, 105.5, 104.2, 107.0],
          "Commodity 2": [250.1, 250.0, 245.1, None, 249.5, 252.5],
          "Commodity 3": [310.1, 320.0, 312.5, None, 315.2, 309.1]
         }

# Create a pandas DataFrame
df = pds.DataFrame(data = prices);
dateTimes = ['2020-08-05 12:31:10',
             '2020-08-05 12:32:11',
             '2020-08-05 12:33:12',
             '2020-08-05 12:34:13',
             '2020-08-05 12:35:14',
             '2020-08-05 12:35:15',
             ];

# Create an index of DateTime instances
dIndex = pds.DatetimeIndex(dateTimes);

# Attach the index to the DataFrame
df.index = dIndex;             
print(df);

# Query the rows that has non-none values between
# t0 and t4
df1 = df.asof(pds.DatetimeIndex(['2020-08-05 12:31:10',
                                    '2020-08-05 12:35:14']));
# Print the query results
print(df1);

Output:

                                      Commodity 1  Commodity 2  Commodity 3
2020-08-05 12:31:10        110.1        250.1        310.1
2020-08-05 12:32:11          NaN        250.0        320.0
2020-08-05 12:33:12        109.6        245.1        312.5
2020-08-05 12:34:13        105.5          NaN          NaN
2020-08-05 12:35:14        104.2        249.5        315.2
2020-08-05 12:35:15        107.0        252.5        309.1
                     Commodity 1  Commodity 2  Commodity 3
2020-08-05 12:31:10        110.1        250.1        310.1
2020-08-05 12:35:14        104.2        249.5        315.2

Copyright 2024 © pythontic.com